On Dimensionality Reduction Techniques for Cross-Language Information Retrieval

Author

  • Parth Gupta
Abstract

With the advent of the Web, cross-language information retrieval (CLIR) has become important not only for satisfying information needs across languages but also for mining multilingual resources, e.g. parallel or comparable documents. Broadly, CLIR techniques are of two types. The first translates either the queries or the documents into the language of comparison; the second projects the vector space representation of the text into a shared translingual space that represents the “semantics” of the documents. In this study, we review the state of the art in CLIR using the latter approach and identify the scope for further research.

Similar resources

Cross-language Information Retrieval, Document Alignment and Visualization – A Study with Japanese and Chinese

With the advent of the Internet and digital libraries, as well as the proliferation of multilingual information, sophisticated methods for representing, indexing, and retrieving such information are essential. In recent years, the amount of electronically available information has escalated. Non-English information (information in Asian and European languages) is growing rapidly. A...

2D Dimensionality Reduction Methods without Loss

In this paper, several two-dimensional extensions of principal component analysis (PCA) and linear discriminant analysis (LDA) techniques have been applied in a lossless dimensionality reduction framework for a face recognition application. In this framework, the benefits of dimensionality reduction were used to improve the performance of its predictive model, which was a support vector machine (...

University of Chicago at CLEF2004: Cross-language Text and Spoken Document Retrieval

The University of Chicago participated in the Cross-Language Evaluation Forum 2004 (CLEF2004) cross-language multilingual, bilingual, and spoken language tracks. Cross-language experiments focused on meeting the challenges of new languages with freely available resources. We found that modest effectiveness could be achieved with the additional application of pseudo-relevance feedback to overcome...

Japanese-Chinese Cross-Language Information Retrieval: An Interlingua Approach

Electronically available multilingual information can be divided into two major categories: (1) alphabetic language information (English-like alphabetic languages) and (2) ideographic language information (Chinese-like ideographic languages). The information available in non-English alphabetic languages as well as in ideographic languages (especially, in Japanese and Chinese) is growing at an i...

Learning Curved Multinomial Subfamilies for Natural Language Processing and Information Retrieval

Many problems in natural language learning and information retrieval involve estimating probabilities in very large discrete state spaces. Dimension reduction as well as clustering techniques in various flavors have been popular choices to deal with the problem of data sparseness. In this paper, we present a general framework for dimension reduction based on curved multinomial subfamilies. The in...

Publication date: 2013